Skip to content

feat: add chain sync protocol for block history synchronization#84

Open
0u-Y wants to merge 2 commits into
StabilityNexus:mainfrom
0u-Y:feature/chain-sync
Open

feat: add chain sync protocol for block history synchronization#84
0u-Y wants to merge 2 commits into
StabilityNexus:mainfrom
0u-Y:feature/chain-sync

Conversation

@0u-Y
Copy link
Copy Markdown

@0u-Y 0u-Y commented Mar 31, 2026

Addressed Issues:

Fixes #83

Screenshots/Recordings:

Before (without Chain Sync):
New node joins after Block #1 is mined — only genesis block is synced:

Node A — Chain length: 2 blocks (Block #0, Block #1)
Node C — Chain length: 1 block  (Block #0 only)

After (with Chain Sync):
New node receives full block history on connection:

Node A — Chain length: 6 blocks (Block #0 ~ Block #5)
Node B — Chain length: 6 blocks (Block #0 ~ Block #5)

Log output from new node:

🔄 Synced account c03ecbb09cb5... (balance=350)
🔄 Accepted state sync from 127.0.0.1:9000 — 2 accounts
📡 Requesting blocks 1~5 from 127.0.0.1:9000
✅ Chain synced: added 5 blocks

Additional Notes:

What this PR does:
Adds a minimal 3-message handshake protocol for block history synchronization when new nodes join the network. Inspired by Bitcoin's Blocks-First IBD, simplified for MiniChain's minimal philosophy.

  • status — share current chain height on peer connection
  • get_blocks — request a range of blocks by height
  • blocks — respond with the requested serialized blocks

Design decisions:

  • Bitcoin IBD uses getblocks → inv → getdata → block (4 messages) for hash-based fork detection and inventory filtering. MiniChain's linear chain makes height-based range requests sufficient, reducing the protocol to 3 messages.
  • Received blocks are validated for chain linkage (validate_block_link_and_hash()) only. Account state relies on the existing sync message to avoid double-applying transactions.
  • Reference: Marabu Blockchain Foundations §5.6 (block validation), §6.3 (honest convergence)

Files changed:

  • minichain/chain.py — added height property, get_blocks_range(), add_blocks_bulk()
  • minichain/p2p.py — added status, get_blocks, blocks message types with validators
  • main.py — added status exchange in on_peer_connected(), added message handling in handler()

Known limitation:

  • Concurrent sync from multiple peers may cause duplicate block addition attempts, which are safely rejected by validate_block_link_and_hash().
  • Current design uses height-based block requests (linear chain assumption). Will need hash-based requests when Fork Choice is implemented.

Tested scenarios:

  1. New node syncs 1 block from existing node ✅
  2. New node syncs 5 blocks in a single batch ✅
  3. Already-synced nodes connect without unnecessary requests ✅
  4. Both nodes at height=0 — no sync triggered ✅

AI Usage Disclosure:

Check one of the checkboxes below:

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.
    I have used the following AI models and tools: Claude (Anthropic) — used for design discussion and code review. All code was reviewed, tested, and verified locally before submission.

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

Release Notes

  • New Features
    • Peers can now compare chain heights and automatically synchronize missing blocks
    • Added block range queries and bulk import capabilities
    • Improved peer coordination with new status and block synchronization messages

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 31, 2026

Warning

Rate limit exceeded

@0u-Y has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 22 minutes and 6 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7c639009-cd4a-4b6b-9a5b-79f479445ba1

📥 Commits

Reviewing files that changed from the base of the PR and between 3ce575c and 54540b6.

📒 Files selected for processing (3)
  • main.py
  • minichain/chain.py
  • minichain/p2p.py

Walkthrough

Implemented a three-message P2P protocol (status, get_blocks, blocks) enabling nodes to exchange heights, request missing block ranges, and apply received blocks. Added blockchain methods for height, range retrieval, and bulk block application, extended P2P validators, injected per-connection writers, and updated connection handshake to send status.

Changes

Chain Sync

Layer / File(s) Summary
P2P messages, validators, and writer injection
minichain/p2p.py
SUPPORTED_MESSAGE_TYPES now includes status, get_blocks, blocks. Added _validate_status_payload, _validate_get_blocks_payload, _validate_blocks_payload, wired them into _validate_message, and inject _writer into incoming message dicts for responses.
Blockchain range APIs and bulk apply
minichain/chain.py
Added Blockchain.height property, get_blocks_range(from_height,to_height) returning serialized block dicts (inclusive, clamped), and add_blocks_bulk(block_dicts) which deserializes, validates linkage/hash sequentially under lock, appends accepted blocks, and returns (success, added_count).
Network handler: status/get_blocks/blocks handling and handshake
main.py
On peer connect, send status (local height) after sync. In handler, implement status branch to compare heights and request missing blocks via _writer, get_blocks branch to fetch a range and reply with blocks, and blocks branch to call chain.add_blocks_bulk() and log results.

Sequence Diagram(s)

sequenceDiagram
  participant NodeA as Node A (peer, higher height)
  participant NodeB as Node B (new/behind)
  participant ChainB as NodeB.chain
  participant Writer as NodeB._writer

  NodeA->>NodeB: status(height=HA)
  NodeB->>NodeA: status(height=HB)
  NodeB->>NodeA: get_blocks(from=HB+1,to=HA)  -- via Writer
  NodeA->>NodeA: chain.get_blocks_range(HB+1,HA)
  NodeA->>NodeB: blocks([B...])  -- via Writer
  NodeB->>ChainB: add_blocks_bulk([B...])
  ChainB-->>NodeB: (success, added_count)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • StabilityNexus/MiniChain#54 — Modifies _listen_to_peer() and related peer metadata injection; overlaps with the message-context injection changes here.

Suggested labels

Python Lang

Suggested reviewers

  • Zahnentferner

Poem

🐰 I hopped across a chain so long,

I passed a status, short and strong,
I asked for blocks, they came in streams,
Now every node can share the dreams,
hop hop — synced, we stomp and sing!

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding a chain sync protocol for block history synchronization, which is the primary focus of the PR.
Linked Issues check ✅ Passed The PR implements all required components from issue #83: the 3-message protocol (status, get_blocks, blocks), height property, range retrieval, bulk add, message validators, and status exchange on peer connection.
Out of Scope Changes check ✅ Passed All code changes are scoped to the objectives in issue #83: minichain/chain.py (height, range retrieval, bulk add), minichain/p2p.py (new message types), and main.py (status exchange and handling).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@main.py`:
- Around line 372-378: Capture the chain tip once before sending any data so the
state snapshot and the status message use the same height: read chain.height
(and the sync snapshot value used earlier) into local variables (e.g.,
snapshot_height and status_height) before calling writer.write or awaiting
writer.drain, build the status_msg using that captured height instead of
accessing chain.height again, and then write/drain and log as before; update
references to status_msg, writer.write, writer.drain, and any earlier sync
variable to use these captured variables so both payloads reflect the same tip.
- Around line 159-201: The issue: importing historical blocks via the "blocks"
message calls chain.add_blocks_bulk(received) which does not replay
transactions, so local state can diverge unless the node's state already matches
the peer tip; update the code and chain logic so bulk imports produce a correct
state: either (A) make the initial sync path (the code path that calls sync and
merges accounts) perform an atomic state replace when accepting a peer snapshot
(ensure sync returns a flag and you call a state_replace() /
chain.replace_state_atomic(...) before applying blocks), or (B) change
add_blocks_bulk (and/or introduce add_blocks_and_replay or
rebuild_state_from_history) to rebuild state by replaying transactions from the
earliest imported block (i.e., derive balances/nonces by applying each block's
transactions rather than only appending block headers). Locate references to
add_blocks_bulk, sync, chain.get_blocks_range, and the "blocks" message handler
and implement one of these fixes so historical imports do not rely on
preexisting local state.

In `@minichain/chain.py`:
- Around line 100-113: The add_blocks_bulk function must be atomic: first
convert each block_dict to Block (using Block.from_dict) and validate linkage
sequentially against a temporary tip via validate_block_link_and_hash without
mutating self.chain or self.last_block; if any validation fails return (False,
0). Only after the entire batch validates, acquire self._lock and append all
validated Block instances to self.chain (updating added appropriately) so the
extension is done under the lock as one operation; reference add_blocks_bulk,
Block.from_dict, validate_block_link_and_hash, self.last_block, self.chain and
self._lock when making the change.

In `@minichain/p2p.py`:
- Around line 231-239: The _validate_blocks_payload currently only checks for
dicts but lets malformed block dicts (e.g. {"foo":1}) through; update
_validate_blocks_payload to iterate payload["blocks"] and for each entry ensure
it's a dict and contains the required keys expected by Block.from_dict (e.g. the
canonical block field names used by Block.from_dict or a Block.REQUIRED_FIELDS
constant if present) and basic types where applicable, returning False if any
block is missing keys or has wrong types; reference _validate_blocks_payload and
Block.from_dict when making the check so the P2P boundary rejects malformed
block entries before attempting to construct Block instances.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7fdf0b31-6a5d-423b-9fe5-258bd65e50cf

📥 Commits

Reviewing files that changed from the base of the PR and between 518c70a and 072d587.

📒 Files selected for processing (3)
  • main.py
  • minichain/chain.py
  • minichain/p2p.py

Comment thread main.py Outdated
Comment on lines +159 to +201
elif msg_type == "status":
import json as _json
peer_height = payload["height"]
my_height = chain.height

if peer_height > my_height:
writer = data.get("_writer")
if writer:
request = _json.dumps({
"type": "get_blocks",
"data": {
"from_height": my_height + 1,
"to_height": peer_height
}
}) + "\n"
writer.write(request.encode())
await writer.drain()
logger.info("📡 Requesting blocks %d~%d from %s",
my_height + 1, peer_height, peer_addr)
elif msg_type == "get_blocks":
import json as _json
from_h = payload["from_height"]
to_h = payload["to_height"]
blocks = chain.get_blocks_range(from_h, to_h)

writer = data.get("_writer")
if writer and blocks:
response = _json.dumps({
"type": "blocks",
"data": {"blocks": blocks}
}) + "\n"
writer.write(response.encode())
await writer.drain()
logger.info("📤 Sent %d blocks to %s", len(blocks), peer_addr)

elif msg_type == "blocks":
received = payload["blocks"]
success, count = chain.add_blocks_bulk(received)

if success:
logger.info("✅ Chain synced: added %d blocks", count)
else:
logger.warning("❌ Chain sync failed after %d blocks", count)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

History import assumes a state snapshot invariant that sync does not guarantee.

add_blocks_bulk() never replays transactions, so this path is only safe when the local state already matches the peer tip. But the sync branch above can reject the snapshot on Lines 120-122 and, when accepted, only merges missing accounts on Lines 130-136. The default --fund path already creates local state before connect, so a node can append historical blocks and still keep non-canonical balances/nonces. Either make initial sync replace local state atomically, or make bulk history import rebuild state instead of relying on sync.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.py` around lines 159 - 201, The issue: importing historical blocks via
the "blocks" message calls chain.add_blocks_bulk(received) which does not replay
transactions, so local state can diverge unless the node's state already matches
the peer tip; update the code and chain logic so bulk imports produce a correct
state: either (A) make the initial sync path (the code path that calls sync and
merges accounts) perform an atomic state replace when accepting a peer snapshot
(ensure sync returns a flag and you call a state_replace() /
chain.replace_state_atomic(...) before applying blocks), or (B) change
add_blocks_bulk (and/or introduce add_blocks_and_replay or
rebuild_state_from_history) to rebuild state by replaying transactions from the
earliest imported block (i.e., derive balances/nonces by applying each block's
transactions rather than only appending block headers). Locate references to
add_blocks_bulk, sync, chain.get_blocks_range, and the "blocks" message handler
and implement one of these fixes so historical imports do not rely on
preexisting local state.

Comment thread main.py Outdated
Comment thread minichain/chain.py Outdated
Comment thread minichain/p2p.py Outdated
Copy link
Copy Markdown
Member

@SIDDHANTCOOKIE SIDDHANTCOOKIE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zahnentferner I've reviewed and added some comments

Comment thread minichain/chain.py
self.chain.append(block)
return True

def get_blocks_range(self, from_height: int, to_height: int) -> list:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@0u-Y
There is no limit enforced on the number of blocks a peer can request at once. If a malicious peer connects and sends a get_blocks message with from_height: 1 and to_height: 10000000, this list comprehension will attempt to serialize millions of blocks into memory all at once. This will immediately exhaust the node's memory (OOM) and crash it. We can definitely enforce a hard limit on the batch size.

Comment thread minichain/chain.py Outdated
return [b.to_dict() for b in self.chain[from_height:to_height + 1]]

def add_blocks_bulk(self, block_dicts: list) -> tuple:
"""Add blocks validating chain linkage only. State relies on sync message."""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@0u-Y There is a fundamental issue with add_blocks_bulk. The method skips executing transactions against the state and explicitly mentions State relies on sync message. In a blockchain, the state must be deterministically derived from the blocks.

If we decouple block synchronization from state execution:

  • A malicious peer can send a chain of valid-looking blocks with fake/invalid transactions, and our node will accept them.
  • State Mismatch: The node will accept a state via a sync message without verifying that the blocks actually produce that state.
    I feel add_blocks_bulk must fully validate each block's transactions against the current state (similar to standard add_block) rather than just validating the block link and hash.

Comment thread minichain/chain.py Outdated
added = 0
for block_dict in block_dicts:
block = Block.from_dict(block_dict)
with self._lock:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here lock is acquired and released on every iteration. Between iterations another thread can append a block, making self.last_block stale for the next call to validate_block_link_and_hash. Move the with self._lock: outside the loop.

Comment thread minichain/p2p.py
def _validate_status_payload(self, payload):
if not isinstance(payload, dict):
return False
if set(payload) != {"height"}:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the new payload validators repeat the exact same type-checking logic, specifically checking if the payload is a dictionary and validating the exact keys using set(payload) != {...}. While I see this follows the pattern of older validators in the file, it introduces unnecessary boilerplate.

You can consider creating a generic helper method to reduce the duplication, or simply inline the dict check if the logic is very simple

Comment thread main.py Outdated
"type": "get_blocks",
"data": {
"from_height": my_height + 1,
"to_height": peer_height
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to_height": peer_height - if a peer lies about its height (e.g. claims height 999999), we request all of it with no cap. The review only flags the responder side in p2p.py, but the requester side in main.py is equally vulnerable.

Comment thread main.py
to_h = payload["to_height"]
blocks = chain.get_blocks_range(from_h, to_h)

writer = data.get("_writer")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Injecting _writer into the message payload and directly calling writer.write(...) in main.py breaks the abstraction of the P2P layer. If we ever switch the transport layer (e.g., from raw sockets to WebSockets or a library), we will have to rewrite the application logic in main.py.
Instead of passing the raw StreamWriter to main.py, consider adding a send_message_to_peer(peer_addr, msg_type, payload) method directly on the P2PNetwork class, and invoke that from main.py.

@0u-Y 0u-Y force-pushed the feature/chain-sync branch from 072d587 to 3ce575c Compare May 19, 2026 16:22
@0u-Y
Copy link
Copy Markdown
Author

0u-Y commented May 19, 2026

Pushed fixes:

  • add_blocks_bulk now mirrors add_block's validate-and-apply pattern (single temp state, atomic commit) — closes tx-execution, partial-rollback, and lock-in-loop.
  • MAX_BLOCKS_PER_REQUEST = 500 cap on both responder and requester sides.
  • New chain.snapshot_state_and_height() so sync and status come from the same tip.
  • _validate_blocks_payload validates each entry via _validate_block_payload and rejects oversized batches.

Deferred: P2PNetwork.send_message_to_peer() refactor, validator boilerplate consolidation.

41 tests pass. @SIDDHANTCOOKIE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Chain Sync Block History Synchronization for New Nodes

2 participants